Portable Techniques to Find Effective Memory Hierarchy Parameters
Authors
Abstract
Application performance on modern microprocessors depends heavily on the performance-related characteristics of the underlying architecture. To achieve the best performance, an application must be tuned to both the target processor family and, in many cases, the specific model, because memory-hierarchy parameters vary in important ways between models. Manual tuning is too inefficient to be practical; we need compilers that perform model-specific tuning automatically. To make such tuning practical, we need techniques that automatically discern the critical performance parameters of a new computer system. While some of these parameters can be found in manuals, many cannot. To further complicate matters, compiler-based optimization should target the system’s behavior rather than its hardware limits. Effective cache capacities, in particular, can be smaller than the hardware limits for a number of reasons, such as sharing between cores or between instruction and data caches. Physical address mapping can also reduce the effective cache capacity. To address these challenges, we have developed a suite of portable tools that derive many of the effective parameters of the memory hierarchy. Our work builds on a long line of prior art that uses micro-benchmarks to analyze the memory system. We separate the design of a reference string that elicits a specific behavior from the analysis that interprets that behavior. We present a novel set of reference strings and a new, robust approach to analyzing the results, and we validate them experimentally on a collection of 20 processors.
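To illustrate the split between a reference string and its analysis, here is a minimal sketch under stated assumptions; it is not the authors' reference strings or their analysis. It assumes a 64-byte cache line, and the helpers `make_pointer_chase`, `walk`, and `find_capacity_transitions` are hypothetical names introduced only for this example. The reference string is a single random cycle through every line of a chosen footprint (a classic pointer-chasing pattern); the analysis flags footprints where per-access latency jumps, using a simple ratio threshold as a stand-in for a robust method. A real benchmark would run and time the chase in C over an array of pointers, since Python interpreter overhead would hide cache effects, so the latencies below are synthetic, not measured.

```python
import random
from typing import List, Sequence, Tuple


def make_pointer_chase(footprint_bytes: int, line_bytes: int = 64,
                       seed: int = 0) -> List[int]:
    """Hypothetical helper: build a pointer-chasing reference string.

    Returns next_line, where next_line[i] is the cache line visited after
    line i. The links form ONE random cycle through all
    footprint_bytes // line_bytes lines (64-byte lines assumed by default),
    so each lap touches every line exactly once in a shuffled order.
    """
    n = footprint_bytes // line_bytes
    if n < 2:
        raise ValueError("footprint must span at least two cache lines")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # seeded so runs are reproducible
    next_line = [0] * n
    for k in range(n):
        # Link each line to its successor in the shuffled order; the last
        # line wraps around to the first, closing a single cycle.
        next_line[order[k]] = order[(k + 1) % n]
    return next_line


def walk(next_line: Sequence[int], steps: int, start: int = 0) -> List[int]:
    """Follow the chain for `steps` hops and return the lines visited.

    A real benchmark would write this loop in C and time it; here it only
    exposes the access order.
    """
    visited = []
    i = start
    for _ in range(steps):
        visited.append(i)
        i = next_line[i]
    return visited


def find_capacity_transitions(sizes: Sequence[int], cycles: Sequence[float],
                              min_ratio: float = 1.3) -> List[Tuple[int, float]]:
    """Hypothetical analysis: flag footprints where per-access cost jumps.

    `sizes` must be increasing. When cycles[i] / cycles[i - 1] >= min_ratio,
    report sizes[i - 1], the largest tested footprint that still ran at the
    lower latency, together with the ratio. This simple threshold test is a
    stand-in, not the robust analysis described in the abstract.
    """
    if len(sizes) != len(cycles):
        raise ValueError("sizes and cycles must have the same length")
    transitions = []
    for i in range(1, len(sizes)):
        ratio = cycles[i] / cycles[i - 1]
        if ratio >= min_ratio:
            transitions.append((sizes[i - 1], ratio))
    return transitions


if __name__ == "__main__":
    chain = make_pointer_chase(4096)  # 64 lines of 64 bytes
    print(len(set(walk(chain, 64))))  # prints 64: one lap visits every line

    # Synthetic latencies (cycles per access), NOT measured data.
    sizes = [16384, 32768, 65536, 131072, 262144, 524288, 1048576]
    cycles = [4.0, 4.0, 4.2, 12.0, 12.5, 13.0, 40.0]
    print([s for s, _ in find_capacity_transitions(sizes, cycles)])
    # prints [65536, 524288]
```

The random single cycle is a common design choice for such micro-benchmarks: each load depends on the previous one, so the time per hop approximates access latency, and the shuffled order keeps simple stride prefetchers from hiding misses. Sweeping the footprint and feeding the resulting latency curve to the analysis step is what would expose an effective capacity that is smaller than the nominal hardware size.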
Similar resources
A Hierarchy Topology Design Using a Hybrid Evolutionary Algorithm in Wireless Sensor Networks
A wireless sensor network is a powerful network that contains many wireless sensors with limited power resources, data processing, and transmission abilities. Wireless sensor capabilities, including computational capacity, radio power, and memory, are very limited. Moreover, to design a hierarchy topology, in addition to energy optimization, find an optimum clusters number and best location of ...
Data Reuse Exploration Techniques for Loop-Dominated Application
Efficient exploitation of temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in embedded data-dominated applications. The effective use of an optimized custom memory hierarchy, or of a customized software-controlled mapping on a predefined hierarchy, is crucial for this. Only recently effective systematic techniques to deal with this spec...
Automatic memory hierarchy characterization
As the gap between memory speed and processor speed grows, program transformations to improve the performance of the memory system have become increasingly important. To understand and optimize memory performance, researchers and practitioners in performance analysis and compiler design require a detailed understanding of the memory hierarchy of the target computer system. Unfortunately, accura...
Source Code Loop Transformations for Memory Hierarchy Optimizations
Portable or embedded systems today run complex applications such as multimedia. These memory-intensive applications and submicronic technologies have made the power consumption criterion crucial. We propose new source-to-source transformations with which we can optimize the behavior of these applications by reducing the amount of needed physical memory and hence the associated power consump...
BlackjackBench: Portable Hardware Characterization with Automated Results' Analysis
DARPA’s AACE project aimed to develop Architecture Aware Compiler Environments. Such a compiler automatically characterizes the targeted hardware and optimizes the application codes accordingly. We present the BlackjackBench suite, a collection of portable micro-benchmarks that automate system characterization, plus statistical analysis techniques for interpreting the results. The BlackjackBen...